UU version 2.1 -- A small, fast, and smart uudecoder
(C) January 1994 -- ir.drs. B.J. Walbeehm
January 17, 1994
Introduction
~~~~~~~~~~~~
Until further notice, whenever I have a new version of UU, I shall upload it
to the FTP site wuarchive.wustl.edu (directory: /pub/MSDOS_UPLOADS/uucode), as
well as to the alt.binaries.pictures.misc and alt.binaries.pictures.utilities
newsgroups on USENET.
UU is a freeware program; please read the file INFO.TXT for more information
on what I mean by this. If the file INFO.TXT was not included in your UU
package, you can obtain it by e-mailing (preferred), writing to, or calling
(least preferred) me; the addresses may be found at the end of this file. In
short, the only thing I ask of you, if you find this program useful, is that
you send me an e-mail.
I have written this program primarily for my own convenience; the first time
I downloaded (a lot of) uuencoded files from the USENET binaries, it took me
over four hours to edit everything in such a way that the only uudecoder I
had then (a very naive one) could process them. That was a
once-but-never-again experience.
Starting with this program, I have broken my rule of writing programs that
run even on an 8086-based machine. The reason is that (as I said) I write my
programs first and foremost for myself, and since I "never" use an 8086 ...
But I can easily convert this program to an 8086-compatible version, and on
popular demand, I may even be willing to do this. Just let me know if you
desperately want an 8086-compatible version. For clarity: UU version 2.1
requires an 80286 or higher.
I have not yet determined the minimum DOS version that this program requires.
(I am currently using MS-DOS 6.20, and I do not have any MS-DOS versions older
than 5.00 lying around.) Anyway, I am quite sure that UU also runs on "very
low" DOS versions. I have learnt that there are still people using 8088-based
machines ... are there actually still people using, say, MS-DOS 3.00 or
below? Or are these versions extinct?
As for memory requirements: The amount of RAM free for executables should be
at least 65k (UU uses two 28k buffers to speed up reading and writing) for
this program to work correctly. UU checks whether enough RAM is free, and
complains if it is not. (I hear some people asking: "65k?" ... Yes,
I know we are talking .COM here, but that does NOT mean we are restricted to
64k now, does it?)
As with all the programs I write, a short usage message is included in UU.
This message may be displayed by entering any of the following three
commands:
UU /?
UU -?
TYPE UU.COM
Starting with version 2.0, UU no longer displays a usage message when one
merely enters "UU". The reason is that I think one should never get
accustomed to invoking a program without parameters or switches just to get
help, because numerous programs actually do something when invoked that way.
In fact, I have written a program ("REMDIR.EXE") that, when invoked without
parameters, can have disastrous effects (unless one really wants it to do
what it does then). What I am trying to say is: Never rely on a program to
give you help by invoking it without any parameters or switches ...
On the uuencoding standard
~~~~~~~~~~~~~~~~~~~~~~~~~~
In my opinion, the uuencoding standard is not very well thought out. As long
as an encoded file consists of only one section (in the early days, splitting
an encoded file up into more than one section was most probably not allowed),
there is not much wrong with the standard, but as soon as the need arose for
files to be split up, the standard should have been changed as well.
To start with, there is no standard way of designating non-section parts,
so the standard provides us with no means whatsoever to distinguish between
encoded sections and mere comments. Also, the standard does not describe a
way of deciding which sections belong together, nor in which order. Most
uuencoders put such additional information in the files, but with the lack
of a standard, almost every single one of them has its own way of doing this.
A number of encoders will also put one or more checksums in the file, but
again, this has not been standardised. It would have been very easy to devise
a standard for adding such additional information, but it has not been done,
and it may be far too late now ...
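What the standard does define is the layout of a single body line: the first character encodes how many bytes the line carries (its character code minus 32, masked to six bits), and each following group of four characters carries three bytes, six bits per character. As an illustration only, here is a minimal Python sketch of decoding one such line (Python is chosen for readability; it is not the language UU is written in). The function name decode_uu_line is hypothetical, and the sketch ignores the "begin" and "end" lines entirely.

```python
def decode_uu_line(line: str) -> bytes:
    """Decode one body line of a uuencoded section (hypothetical helper)."""
    line = line.rstrip("\r\n")
    if not line:
        return b""
    # Each character stands for (code - 32) masked to six bits; the masking
    # makes both the space and the backtick (`) decode to zero.
    count = (ord(line[0]) - 32) & 0x3F   # number of decoded bytes on this line
    needed = (count + 2) // 3 * 4        # four characters per three bytes
    chars = line[1:].ljust(needed)       # restore trailing blanks lost in transit
    out = bytearray()
    for i in range(0, needed, 4):
        a, b, c, d = ((ord(ch) - 32) & 0x3F for ch in chars[i:i + 4])
        out.append((a << 2 | b >> 4) & 0xFF)
        out.append((b << 4 | c >> 2) & 0xFF)
        out.append((c << 6 | d) & 0xFF)
    return bytes(out[:count])            # drop padding bytes of the last group
```

For example, decode_uu_line("#0V%T") returns b"Cat": the "#" announces three bytes, and the four characters after it carry them.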
Command line parameters and switches
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Although the usage message says "UU [drive:][path]filename[.ext] [/I] [/S]",
UU allows all kinds of variations on this: Instead of a slash ("/"), a dash
("-") is accepted as well. UU of course accepts both uppercase and lowercase,
and ignores irrelevant blanks (spaces). Also, using a switch twice or more
has the same effect as using it only once. Moreover, switches (currently, the
switches are "I" and "S") may be combined, and the order in which the filename
and the switches (if any) appear on the command line is irrelevant. This means
that, for instance, all of the following commands are treated identically:
UU example.uue /I /S
UU example.uue -I -S
Uu exAmplE.Uue/s -I
uu/s example.uue/i
uu example.uue -is
uu /is example.uue
uu example.uue /s/i
uu/i -sisssis example.uue
Please note that if the dash ("-") is used to precede a switch, it must be
preceded by at least one blank, since DOS also allows dashes to be part of
a filename (EXCEPT as the LEADING character of a filename). This means that
the following two commands are NOT identical:
uu temp-i
uu temp/i
The former command processes a file called "temp-i" using no switches, while
the latter will use the switch "i" on a file called "temp". So if the latter
interpretation is meant, and one wants to use the dash, then make sure that
at least one blank precedes it, as in:
uu temp -i
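The rules above can be summarised in a small sketch. This is not UU's actual parser; it is a hypothetical Python function, parse_uu_args, that takes the command tail (everything typed after "UU") and applies the rules described in this chapter: blanks separate tokens, every slash starts a group of switches, a dash starts a group of switches only at the beginning of a token (i.e. right after a blank), and switches are case-insensitive and may be repeated or combined. Returning the filename in uppercase and rejecting unknown switches or a second filename are assumptions of the sketch; the /? help request is not handled.

```python
def parse_uu_args(tail: str) -> tuple[str, frozenset[str]]:
    """Apply the documented command-line rules to a command tail (sketch)."""
    names: list[str] = []
    switches: set[str] = set()

    def add_switches(group: str) -> None:
        # Switches may be combined ("is") and repeated ("sisssis").
        for ch in group.upper():
            if ch not in "IS":
                raise ValueError(f"unknown switch: {ch}")
            switches.add(ch)

    for token in tail.split():            # blanks only separate tokens
        first, *rest = token.split("/")   # every slash starts a switch group
        if first.startswith("-"):         # a dash counts only right after a blank
            add_switches(first[1:])
        elif first:
            names.append(first.upper())   # DOS filenames are case-insensitive
        for group in rest:
            add_switches(group)

    if len(names) != 1:
        raise ValueError("expected exactly one filename")
    return names[0], frozenset(switches)
```

With this sketch, parse_uu_args(" temp-i") yields ("TEMP-I", frozenset()), whereas parse_uu_args(" temp -i") yields ("TEMP", frozenset({"I"})), matching the distinction drawn above.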
What UU does, and does not do
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Unlike some other uudecoders I have seen, UU does NOT assume an extension of
.UUE if no extension is given. (Let me know if this bothers you.) This is
for my own convenience, since most of the files I have to process have no
extension.
The current version of UU cannot handle an encoded file whose sections are
spread over different files. So if, for example, a file called EXAMPLE.EXE
has been converted (by some uuencoder) into three sections, and each section
has been written to a different file, say EXAMPLE1.UUE, EXAMPLE2.UUE, and
EXAMPLE3.UUE, then UU will not be able to retrieve the original file.
Until I have implemented a multiple source handler, one can work around
this restriction by first executing the following command (still using the
same example) from the DOS prompt:
COPY /b EXAMPLE1.UUE + EXAMPLE2.UUE + EXAMPLE3.UUE EXAMPLE.TMP
and then feeding the resulting combined file EXAMPLE.TMP to UU. Note that the
/b switch has been added just to be sure; if the source files come directly
from a (any) uuencoder, it will not be necessary, but it will not do any harm
either. Some posting programs, however, put a CTRL-Z character in the file,
in which case the /b switch is absolutely required. Please note also that if
(and only if) the three source files appear in increasing order in the
directory (so EXAMPLE1.UUE comes before EXAMPLE2.UUE, which in turn comes
before EXAMPLE3.UUE), the following DOS command will correctly combine them
as well:
COPY /b EXAMPLE*.UUE EXAMPLE.TMP
The requirement that the files appear in increasing order in the directory
when using the latter COPY command usually does not apply when UU is used in
its "unsorted sections" mode on the resulting file. For more information on
unsorted sections, see the appropriate chapter in this manual. Please note
that in order for the COPY command to work correctly, the resulting file
(EXAMPLE.TMP in the above examples) should have an extension that differs
from any of the files that are to be concatenated (the files ending in .UUE
in the above examples).
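For readers who prefer to do the concatenation outside DOS, the following Python sketch does the same job as COPY /b: files opened in binary mode copy a CTRL-Z (byte 0x1A) like any other byte. The function name concatenate_binary is hypothetical. Unlike the wildcard form of COPY, it sorts the matching names itself instead of relying on their order in the directory, and it skips the output file should the pattern happen to match it.

```python
import glob
import os


def concatenate_binary(pattern: str, out_path: str) -> list[str]:
    """Join every file matching pattern into out_path, byte for byte (sketch)."""
    out_abs = os.path.abspath(out_path)
    # Sort explicitly rather than relying on directory order, and never read
    # the output file as one of the inputs.
    parts = sorted(p for p in glob.glob(pattern) if os.path.abspath(p) != out_abs)
    with open(out_path, "wb") as out:          # binary mode: CTRL-Z is just a byte
        for path in parts:
            with open(path, "rb") as src:
                out.write(src.read())
    return parts
```

Note that the sort is alphabetical, so EXAMPLE10.UUE would come before EXAMPLE2.UUE; with ten or more parts, it is safer to list the files explicitly, as in the first COPY command above.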
If no switches are used (and ONLY then), UU requires the sections to appear
in increasing order in the file. (Please refer to the chapter on unsorted
sections for information on how to handle sections in any other order.) In
particular, this means that this version acts the same as the earlier 1.x
versions when no switches are used. In this mode, the 2.x versions are still
as fast as UU version 1.3 (the fastest of the 1.x versions), so even for
someone who never deals with unsorted sections, the only advantage of version
1.3 would be its smaller size. One of the disadvantages of version 1.3 is
that it contains a small bug -- due to one (!) erroneous byte, it does not
allow the INPUT file to have a name of length 1.
UU always allows the source file to contain more than one uuencoded file, and
each of these files may consist of any number of sections. If no switches are
used, then these sections MUST be in the correct order. So in this case, a
file containing the following sections:
<file 1 part 1>
<file 1 part 2 (last part)>
<file 2 part 1>
<file 2 part 2>
<file 2 part 3 (last part)>
will be handled correctly by UU (and result in two files), whereas
<file 1 part 1>
<file 2 part 1>
<file 1 part 2>
<file 2 part 2>
<file 2 part 3>
and
<file 1 part 2>
<file 1 part 1>
<file 2 part 1>
<file 2 part 3>
<file 2 part 2>
will not. Again, this restriction does NOT apply when UU is told that the
file may contain unsorted sections.
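UU's sorted-order mode simply assumes this ordering. Purely to make the rule itself precise, the following hypothetical Python check, in_sorted_order, takes (file, part) labels for the sections in the order in which they appear in the input file and reports whether that order would be acceptable: each file must start with part 1, continue with consecutive parts, and must not reappear once another file has started. Whether the last part really is the last one is not checked here.

```python
def in_sorted_order(sections: list[tuple[int, int]]) -> bool:
    """True if (file, part) labels follow the sorted-order rule (sketch)."""
    seen_files: set[int] = set()
    current_file, last_part = None, 0
    for file_id, part in sections:
        if file_id == current_file and part == last_part + 1:
            last_part = part                      # next part of the current file
        elif file_id not in seen_files and part == 1:
            seen_files.add(file_id)               # a new file must start at part 1
            current_file, last_part = file_id, 1
        else:
            return False                          # out of order or interleaved
    return True
```

Applied to the three layouts above, it accepts the first and rejects the other two.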
When used in the "sorted order" mode of operation, UU can handle any number of
sections contained in one input file; there is no limit. The only thing that
may happen (apart from your hard disk getting full) is that some of the
numbers that UU displays will not be correct, but this only happens if the
number of sections in one file exceeds 9999. (Yes, I know I used the number
65535 in a previous manual, but that was a mistake. That is what happens when
you socialise with computers too much.)
If the program terminates or aborts after having detected some error, an
ERRORLEVEL of 1 is returned; a successful termination results in ERRORLEVEL 0.
Some platforms do not restrict filenames to at most 8+3 characters, so the
filename in the header of the first section of an encoded file may not be
DOS-compliant. UU recognises this, and prompts the user for a new filename.
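This manual does not say exactly how UU decides that a name is not DOS-compliant. A minimal Python sketch of one plausible check, assuming a conservative character set (letters, digits, and ! # $ % & ' ( ) - @ ^ _ ` { } ~), a base name of one to eight characters, and an optional extension of at most three characters. The name is_dos_compliant is hypothetical, and reserved device names such as CON or NUL are not checked.

```python
import re

# Conservative set of characters assumed to be valid in a DOS 8+3 name.
_CHARS = r"[A-Za-z0-9!#$%&'()\-@^_`{}~]"
# One to eight name characters, optionally a dot and up to three more.
_DOS_83 = re.compile(_CHARS + r"{1,8}(?:\." + _CHARS + r"{0,3})?")


def is_dos_compliant(name: str) -> bool:
    """True if name fits the assumed 8+3 pattern (sketch)."""
    return _DOS_83.fullmatch(name) is not None
```

For example, is_dos_compliant("EXAMPLE.ZIP") is True, while is_dos_compliant("averylongfilename.jpeg") is False, which is the situation in which UU asks for a new name.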
If the filename for an encoded file already exists, the user is informed of
this, and may then choose to either overwrite the old file, or rename the new
one. At this point, CTRL-Break (and CTRL-C) may be used to abort the process.
As opposed to some other uudecoders, UU does not choke on CTRL-Z characters.
UU ignores lines that are not uuencoded, typically before and after sections.
I saw somewhere that a uudecoder written by someone else could be notified
that (for example) "---" is not a decodable line, as it seems that this line
is used as a cut line on several BBS systems. With UU, it is not possible to
designate such a non-decodable line ... merely because UU does not need that
information to determine that a given line is not to be treated as a uuencoded
line. UU uses four tests to determine whether a line is a mere comment or not,
and treats the line as an encoded line only if all four tests show it is not a
comment. These tests are partly performed simultaneously, and always in such a
way as to require hardly any additional time (e.g. when the data required for
a test is available due to some other action currently being performed).
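This manual does not describe UU's four tests, so the following Python sketch is not UU's method. It is a hypothetical function, is_encoded_line, with three illustrative checks that follow from the uuencoding line format: every character lies between the space (code 32) and the backtick (code 96), the length character announces at most 45 bytes, and the line carries four characters for every three announced bytes. Tolerating one extra trailing character is an assumption made for leniency.

```python
def is_encoded_line(line: str) -> bool:
    """Heuristic: could this be a uuencoded body line? (illustrative checks)"""
    line = line.rstrip("\r\n")
    if not line:
        return False
    # Check 1: every character lies in the uuencoding alphabet (32..96).
    if any(not 32 <= ord(ch) <= 96 for ch in line):
        return False
    # Check 2: the length character announces at most 45 bytes.
    count = (ord(line[0]) - 32) & 0x3F
    if count > 45:
        return False
    # Check 3: four characters per three announced bytes, allowing one
    # extra trailing character (an assumption, to be lenient).
    needed = (count + 2) // 3 * 4
    return len(line) - 1 in (needed, needed + 1)
```

With these checks alone, the cut line "---" is rejected (its "-" announces 13 bytes, which would need 20 more characters), as are "begin" and "end" lines, since lowercase letters lie above the backtick. A line whose trailing blanks were stripped in transit would also be rejected, which a real decoder would have to allow for.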
Although UU is quite intelligent, it is possible to fool it; but I think that
this is purely academic, for the chances of it being fooled are astronomically
small (unless someone deliberately tries to fool UU). Even if one decoded
hundreds of thousands of uuencoded files, UU would most probably not be fooled
even once. And if it should ever happen that UU is fooled, then, please, do
not blame UU or me, but blame the one who invented the uuencoding standard for
not making it more strict. Or, put another way: All uudecoders can be fooled,
but mine must be one of the most reliable ones, as I can easily show by a
simple computation of probabilities. Of course, UU cannot perform
miracles, so if the uuencoded file is corrupt to begin with, UU will be
helpless too.
Handling unsorted sections
~~~~~~~~~~~~~~~~~~~~~~~~~~
UU can also handle files containing randomly ordered sections. For this mode
of operation, two switches are available: /I and /S. When invoked with /I only,
UU will scan the source file, and it will subsequently report what it has found
there, but it will not actually decode anything. When invoked with both /I and
/S (or any equivalent notation -- see the chapter on command line parameters
and switches), it WILL start decoding after having reported the information.
A less verbose, but equally efficient result is obtained by specifying only
the /S switch.
Although there is a maximum number of sections that UU can handle in this
"unsorted sections" mode of operation, this can hardly be considered a
restriction, since this maximum is 434.
This mode of operation, although still very fast, is slower than the "sorted
order" mode. Just how much slower depends on the order in which the sections
appear. Worst-case performance (in terms of speed) occurs when the sections
appear in reverse order; considerable gains may be achieved on systems using
disk caches and/or RAM drives.
Since the "sorted order" mode uses one very powerful assumption (viz. the
sections being in sorted order), whereas the "unsorted sections" mode can (at
best) only rely on whatever information it filters out of the source file, it
is possible for UU to obtain better results in the former mode. So I recommend
using the "sorted order" mode whenever one is sure that every section appears
in the correct order (which, as noted earlier, is also faster).
So how does UU obtain its information? The current version of UU recognises
more than fifteen different uuencoders and posting programs. (For the ease of
discussion, I shall use the term "uuencoders" when I mean "uuencoders and/or
posting programs" in the remainder of this manual.) As far as I know, these
are mostly uuencoders used on PCs and UNIX systems, but I would rather not
list the uuencoders it recognises until I have identified most of them.
If it cannot recognise the uuencoders that were used, or if these have not
included all of the necessary information in the file, UU tries to use the
"Subject:" lines (if it finds any) that may be included if the file contains
postings from USENET. Instead of "Subject:" lines, some newsreaders produce
"Description:" lines; these are also supported by UU. In the remainder of this
manual, I shall no longer refer to "Description:" lines, but whatever holds
for "Subject:" lines, also applies to "Description:" lines.
If postings from USENET are used, I recommend NOT chopping off the headers
(and thus the "Subject:" lines) for a higher chance of success. "Subject:"
lines are used only if all else fails, because of the higher chance of these
containing errors. For instance, someone may have erroneously given a five part
file a subject line of "EXAMPLE.ZIP (4/6)" indicating that there are six parts.
But even when things like this happen, there is a good chance that UU will
successfully decode these files all the same. To end this subject (no pun
intended), some examples of "Subject:" lines, and how they will be processed
by UU:
- Subject: EXAMPLE.ZIP (4/6)
UU sees this as part four of a six part file called EXAMPLE.ZIP.
- Subject: PICTURE.GIF {Just another picture} [01/10]
As expected, UU will see this as part one of a ten part file called
PICTURE.GIF.
- Subject: Repost:AGAIN.EXE(Part3of20).Reposted on popular demand.
Yes, UU will assume it is dealing with part three of a twenty part file
called AGAIN.EXE.
- Subject: >FOOBAR.JPG (b/w) {Another picture} (part 3/5.
UU is not fooled by "(b/w)", nor by the ">"; it will correctly assume
this is part three of a five part file called FOOBAR.JPG.
- Subject: - FooBar.Jpg {Another picture /0 } part04 of5} (6 /w ).
Even this does not fool UU; it assumes it is dealing with part four of a
five part file called FooBar.Jpg. Moreover, UU will see this as a further
part of the same file as in the previous example.
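UU's actual parsing rules are not documented here. Purely as an illustration, here is a much simpler, hypothetical Python parser, parse_subject, that happens to handle the five examples above: it takes the first name followed by a dot and an extension of one to three characters, then looks for "part N of M" or "part N/M", and failing that for "N/M" with digits on both sides, so that "(b/w)" and "/0" are ignored. A number that cannot be found is returned as 0. Real subject lines will defeat it far more easily than they defeat UU.

```python
import re
from typing import Optional

# Hypothetical patterns, tuned only to the examples in this manual.
_NAME = re.compile(r"[A-Za-z0-9_\-]+\.[A-Za-z0-9]{1,3}(?![A-Za-z0-9])")
_PART_OF = re.compile(r"part\s*(\d+)\s*(?:of|/)\s*(\d+)", re.IGNORECASE)
_SLASH = re.compile(r"(\d+)\s*/\s*(\d+)")


def parse_subject(line: str) -> tuple[Optional[str], int, int]:
    """Return (filename, part, total); 0 means "not found" (sketch)."""
    text = re.sub(r"^\s*(?:Subject|Description):", "", line, flags=re.IGNORECASE)
    name_match = _NAME.search(text)
    name = name_match.group(0) if name_match else None
    # Prefer an explicit "part N of M"; otherwise fall back to digits/digits,
    # which skips things like "(b/w)" or "/0" that lack digits on one side.
    numbers = _PART_OF.search(text) or _SLASH.search(text)
    if numbers is None:
        return name, 0, 0
    return name, int(numbers.group(1)), int(numbers.group(2))
```

Comparing the returned names case-insensitively is what lets the last two examples be recognised as parts of the same file.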
Although these examples show that UU is quite "intelligent" while dealing with
these lines, I realise that my "Subject:" line parser still leaves room for
improvement. Either way, the name it finds in the "Subject:" line is not all
that important since the name of the file also appears in the header of the
first section of a uuencoded file. And most of the time (even when it extracts
false information from the "Subject:" line), UU will still yield a correct
result.
And while on the subject of filenames: Most of the uuencoders also include
the filename at the start of each (not only the first) section, one way or
another. For at least some of them, it may be the case that this name differs
from the one that is in the header of the first section. And of course, this is
also possible for the name UU filters out of the "Subject:" line. That is why,
when using the /I switch, UU will give two names for each section it finds.
The real name (i.e. the one from the header of the first section) is the one
that is NOT parenthesised. And although UU will display the names exactly as
they appear in the file, it will perform a case-insensitive comparison between
these names, thus making up for capitalisation inconsistencies by the person
who posted the file.
Also when using the /I switch, UU will give the section number and the total
number of sections for each section (as far as this could be determined of
course). This is displayed as in "(003/010)", which would mean that
this section is part three of a ten part file. Whenever a number could not be
determined, "000" is printed instead. Finally (still when using the /I switch
only), UU displays some information on any section it will not be able to
process, as well as the reason for this.
The remainder of this chapter holds for both the /I and /S switches: Whenever
a filename that was encountered is longer than twelve characters, only its
first eleven characters are displayed, followed by an asterisk (*). Of
course, the full name will be displayed when prompting the user for a new
filename.
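A hypothetical Python helper, format_section, showing the two display rules just described: a zero-padded "(003/010)" style counter, with 0 (printed as "000") for any number that could not be determined, and names longer than twelve characters cut to eleven characters plus an asterisk. The twelve-character column width is an assumption; UU's exact screen layout (which also shows a second, parenthesised name) is not reproduced here.

```python
def format_section(name: str, part: int, total: int) -> str:
    """Format one line of an /I-style listing (hypothetical layout)."""
    # Names longer than twelve characters: first eleven characters plus "*".
    shown = name if len(name) <= 12 else name[:11] + "*"
    # 0 stands for "could not be determined" and is printed as 000.
    return f"{shown:<12} ({part:03d}/{total:03d})"
```

For example, format_section("PICTURE.GIF", 3, 10) gives "PICTURE.GIF  (003/010)".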
When UU has scanned the input file, it will list the name and number of
sections of each COMPLETE file it has found. It also gives the total number
of sections it has found, the number of sections it could not identify, and
the number of sections that may be processed. Note that the latter number is
not necessarily the difference of the former two, because there are various
reasons why a section that WAS identified cannot be processed after all (for
example, when other sections of the same file are missing). The actual
reason will usually be given when using the /I switch.
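The bookkeeping behind these totals can be sketched as follows; this hypothetical Python function, summarise, is not UU's code. It assumes each section has already been identified as a (name, part, total) triple, with None or 0 for anything that could not be determined, groups the sections by name case-insensitively, and counts a file as complete only when all of its sections agree on the total and every part from 1 to that total is present. Duplicate sections are not handled.

```python
from collections import defaultdict


def summarise(sections):
    """sections: (name, part, total) triples; None or 0 means unknown (sketch)."""
    display = {}                   # case-folded name -> first spelling seen
    parts = defaultdict(dict)      # case-folded name -> {part: total}
    unidentified = 0
    for name, part, total in sections:
        if name is None or part == 0 or total == 0:
            unidentified += 1
            continue
        key = name.lower()         # compare names case-insensitively
        display.setdefault(key, name)
        parts[key].setdefault(part, total)
    complete = {}
    for key, found in parts.items():
        totals = set(found.values())
        # Complete: all sections agree on the total and parts 1..total exist.
        if len(totals) == 1:
            total = totals.pop()
            if set(found) == set(range(1, total + 1)):
                complete[display[key]] = total
    processable = sum(complete.values())
    return complete, len(sections), unidentified, processable
```

For instance, given the sections ("EXAMPLE.ZIP", 1, 2), ("example.zip", 2, 2), ("PIC.GIF", 1, 3), (None, 0, 0) and ("PIC.GIF", 3, 3), it reports one complete file (EXAMPLE.ZIP, 2 sections), 5 sections in total, 1 unidentified and 2 processable; 2 is indeed not 5 minus 1, because part two of PIC.GIF is missing.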
I have done my very best to make UU as smart as possible, but as noted earlier,
due to the fact that the uuencoding standard is not strict enough, even the
most intelligent uudecoder may not be able to correctly figure everything out.
Let me end this chapter by quoting Nick Viner: "Of course some files which have
been split by hand and not labelled adequately will always defeat it!"
Plans for future versions of UU
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
With the exception of the first two points, the following plans do not have
high priority for me ... but I am open to suggestions, so if you have any
arguments in favour of any of these, or perhaps some new suggestions, please
let me know.
I can think of several ways of making UU even smarter. For instance, by adding
support for even more uuencoders (if I find any). Another option is to have UU
use the information it gathers but does not use so far, so as to have it make
its own assumptions about sections that could only be partially identified.
The latter case would then be as if UU said "These sections probably belong
together ... well, let's assume they do, and process them.". Finally, the
routine that deals with USENET's "Subject:" lines could be made yet a little
smarter.
Another option I plan to add is to have UU be able to write every section
that has not been processed to a separate file. Related to this would be an
option to have UU output all non-encoded data.
I am considering having UU be able to handle files whose sections are not
all contained in one and the same file (i.e. the PART1.UUE, PART2.UUE,
PART3.UUE scheme), but I should add that this does not have high priority,
since I need this only very rarely, and for these rare cases, I do not mind
using the COPY command first in order to put everything in one file.
As an alternative to the former, or even in addition to it, I may some day
have UU accept wildcards in the filename.
I am considering adding a switch (/d for instance) allowing one to have the
input file deleted after it has been SUCCESSFULLY processed. Again, this does
not have high priority for me, but on the other hand, it would be very easy
to add this. So anyone in favour of this is kindly requested to respond.
(People who are against this option need not respond, I guess, because no one
is forced to actually use all of UU's options.)
Some uuencoders put checksums in the files. I may have a future version of UU
be able to check these.
I may also write a uuencoder that is just as fast and even smaller.
I may add a third option to UU in case a file already exists, viz. "skip",
which will allow the user to choose not to process this file, and continue
with the next (if any).
I may also add support for xxencoded files to UU.
Someone suggested it would be nice if one could change UU's defaults, so that,
for example, the /S switch would then be assumed automatically. I do not like
to do this, since it would make using UU less easy. I think that naive users
would be frightened by the prospect of having to edit some configuration file
(or something like that) first. Moreover, I think typing "UU/S" instead of
"UU" cannot be a real bother. Or stated differently: If I had given this
program a longer name, then those extra characters would have to be entered
anyway.
Acknowledgements
~~~~~~~~~~~~~~~~
I should like to thank the following persons:
- Terry O'Brien for sending me detailed information on the file mode code
in the header of uuencoded files, and on uuencoding in general.
- Martin (sorry, don't know your last name) from Nottingham (?) for telling
me about the bug :-( in version 1.1 (and 1.0).
- Brian Norris for telling me about the bug :-( in version 1.3 (and earlier
versions).
- Douglas Swiggum for all the trouble taken in sending me "strange" uuencoded
files, and detailed descriptions of what happened. You have saved me a lot
of time in finding two bugs :-( in version 2.0!
Last but not least, I should like to thank all the people who have let me
know they appreciate my program, or otherwise (e.g. by telling me about bugs)
mailed me regarding UU.
Release history
~~~~~~~~~~~~~~~
In my version numbering convention, 0.x versions usually denote unreleased
prototype versions.
Versions 0.1 through 0.4, and 0.6 were private, unreleased versions, written
in a mixture of Pascal and Assembly-language.
Version 0.5 was given to only a few people to see how they liked it. It had
resulted from a process of stepwise refinement in which speed, size, feedback,
and user-friendliness were tackled. Versions 0.1 through 0.5 were all written
on 11-Dec-93. They were EXE files; version 0.5 had a size of 5872 bytes.
UU 0.6 Type: EXE Size: 3424 Date: 14-Dec-93
The last prototype version. Most of it written in assembly. Yet a bit
faster than 0.5.
UU 1.0 Type: COM Size: 1993 Date: 15-Dec-93
The first publicly released version. Apart from some tiny details, this is
the full-assembly version of 0.6.
UU 1.1 Type: COM Size: 1965 Date: 18-Dec-93
Even smarter in distinguishing comment lines from encoded lines (a fourth
test has been added). Sections containing only one non-empty line are now
recognised as such. Detects when the disk is full, upon which it aborts
with an appropriate message. Yet a bit faster than 1.0.
UU 1.2 Type: COM Size: 1896 Date: 23-Dec-93
Now really only accepts "y", "Y", "n", and "N" while asking permission to
overwrite an existing file. Also, CTRL-Break (and CTRL-C) can be used at
this point to abort the program immediately.
UU 1.3 Type: COM Size: 1892 Date: 25-Dec-93
In earlier versions, lines of more than 255 characters COULD (although it
is HIGHLY improbable they actually WOULD) result in decoded files being
corrupted; starting with this version, this can no longer happen. Yet a
bit faster than 1.2 (partly, but not only, because the read and write
buffers are now each 4k larger).
UU 2.0 Type: COM Size: 5866 Date: 09-Jan-94
Now also allows files containing unsorted sections. An intelligent command
line parser has been added. Because of this, the bug of UU not accepting
filenames of length 1 in the command line (in fact, I did not even know
about this bug until some time after I had finished the parsing routines)
no longer exists. Aborts with an appropriate message if there is not enough
(conventional) RAM free. Displays an error message if invoked without any
parameters or switches.
UU 2.1 Type: COM Size: 6257 Date: 17-Jan-94
I really thought I had solved the problem of lines containing more than
255 characters in version 1.3, but I had not; now, it is REALLY fixed.
Added support for five more uuencoders and posting programs, as well as for
"Description:" lines. Made the parser for "Subject:" (and "Description:")
lines even more intelligent. Fixed a bug that seemed to matter only when
run from the DOS box under Windows. The maximum number of unsorted sections
UU can handle is slightly higher. Some minor changes not worth mentioning.
Contacting the author <-- Hey, that's me! :-)
~~~~~~~~~~~~~~~~~~~~~
Contact me (preferably using e-mail) if you have any questions, suggestions,
remarks, etc., on this document, on UU, or on any other of my programs.
Also, if you find a valid uuencoded file that UU does not process correctly,
please let me know. And if at all possible, pray send that file along to me
(or otherwise a detailed description of its contents), preferably in some
(any) compressed form in order to keep my mail server from automagically
ruining it. Beyond my control, my mail server automatically decodes (or tries
to anyway) uuencoded files, so I would not end up with your uuencoded file.
Thank you very much!
I check the alt.binaries.pictures.misc and alt.binaries.pictures.utilities
newsgroups on USENET regularly, so you could also try placing messages for
me there. Finally, please send me an e-mail if you think my program is of
use to you (or flame me if you think it is useless). If I do not get enough
feedback, I take it that people are not interested, and I shall ... continue
writing programs for myself, but DIScontinue spreading them on anything but
a very small scale.
Ben Jos Walbeehm (Please get my first name right, it is "Ben Jos".)
Lijsterbeslaan 20
5248 BB Rosmalen
The Netherlands
Phone : +31 4192 14345 (The best time (GMT) to get hold of me is at night!)
E-mail: Walbeehm@fsw.ruu.nl